Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks

Authors

  • David Saad
  • Sara A. Solla
Abstract

We consider the problem of on-line gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process.

Learning in layered neural networks refers to the modification of internal parameters $\{\mathbf{J}\}$ which specify the strength of the interneuron couplings, so as to bring the map $f_{\mathbf{J}}$ implemented by the network as close as possible to a desired map $\tilde f$. The degree of success is monitored through the generalization error, a measure of the dissimilarity between $f_{\mathbf{J}}$ and $\tilde f$.

Consider maps from an $N$-dimensional input space $\boldsymbol{\xi}$ onto a scalar $\zeta$, as arise in the formulation of classification and regression tasks. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such $N$-to-one dimensional maps. Information about the desired map $\tilde f$ is provided through independent examples $(\boldsymbol{\xi}^{\mu}, \zeta^{\mu})$, with $\zeta^{\mu} = \tilde f(\boldsymbol{\xi}^{\mu})$ for all $\mu$. The examples are used to train a student network with $N$ input units, $K$ hidden units, and a single linear output unit; the target map $\tilde f$ is defined through a teacher network of similar architecture except for the number $M$ of hidden units. We investigate the emergence of generalization ability in an on-line learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in $\{\mathbf{J}\}$ are described as a dynamical evolution; the number of examples plays the role of time. In this paper we limit our discussion to the case of the soft-committee machine [2], in which all the hidden units are connected to the output unit with positive couplings of unit strength, and only the input-to-hidden couplings are adaptive.

Consider the student network: hidden unit $i$ receives information from input unit $r$ through the weight $J_{ir}$, and its activation under presentation of an input pattern $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_N)$ is $x_i = \mathbf{J}_i \cdot \boldsymbol{\xi}$, with $\mathbf{J}_i = (J_{i1}, \ldots, J_{iN})$ defined as the vector of incoming weights onto the $i$-th hidden unit. The output of the student network is $\sigma(\mathbf{J}, \boldsymbol{\xi}) = \sum_{i=1}^{K} g(\mathbf{J}_i \cdot \boldsymbol{\xi})$, where $g$ is the activation function of the hidden units, taken here to be the error function $g(x) \equiv \mathrm{erf}(x/\sqrt{2})$, and $\mathbf{J} \equiv \{\mathbf{J}_i\}_{1 \le i \le K}$ is the set of input-to-hidden adaptive weights.

Training examples are of the form $(\boldsymbol{\xi}^{\mu}, \zeta^{\mu})$. The components of the independently drawn input vectors $\boldsymbol{\xi}^{\mu}$ are uncorrelated random variables with zero mean and unit variance. The corresponding output $\zeta^{\mu}$ is given by a deterministic teacher whose internal structure is the same as for the student network but may differ in the number of hidden units. Hidden unit $n$ in the teacher network receives input information through the weight vector $\mathbf{B}_n = (B_{n1}, \ldots, B_{nN})$, and its activation under presentation of the input pattern $\boldsymbol{\xi}^{\mu}$ is $y_n^{\mu} = \mathbf{B}_n \cdot \boldsymbol{\xi}^{\mu}$. The corresponding output is $\zeta^{\mu} = \sum_{n=1}^{M} g(\mathbf{B}_n \cdot \boldsymbol{\xi}^{\mu})$. We will use indices $i, j, k, l, \ldots$ to refer to units in the student network, and $n, m, \ldots$ for units in the teacher network. The error made by a student with weights $\mathbf{J}$ on a given input $\boldsymbol{\xi}$ is given by the quadratic deviation $\epsilon(\mathbf{J}, \boldsymbol{\xi}) = \frac{1}{2}\left[\sigma(\mathbf{J}, \boldsymbol{\xi}) - \zeta\right]^2$.
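The teacher-student setup above translates directly into a small simulation. The sketch below, assuming illustrative values for $N$, $K$, $M$, the learning rate, and random teacher weights (none of which are specified in this excerpt), implements the soft-committee student with erf activations, the fixed teacher, and on-line gradient descent on the quadratic error after each example.

```python
# Minimal sketch of the teacher-student on-line learning setup described above.
# The sizes N, K, M, the learning rate eta, and the random teacher weights are
# illustrative assumptions; only the architecture and update rule follow the text.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, K, M = 100, 3, 2        # input dimension, student / teacher hidden units
eta = 0.5                  # learning rate (illustrative value)

def g(x):
    """Hidden-unit activation g(x) = erf(x / sqrt(2))."""
    return erf(x / np.sqrt(2.0))

def g_prime(x):
    """Derivative g'(x) = sqrt(2/pi) * exp(-x**2 / 2)."""
    return np.sqrt(2.0 / np.pi) * np.exp(-x**2 / 2.0)

def output(W, xi):
    """Soft-committee output sigma(W, xi) = sum_i g(W_i . xi);
    hidden-to-output couplings are fixed to +1, only the rows of W adapt."""
    return g(W @ xi).sum()

B = rng.normal(size=(M, N))                # teacher weights B_n (fixed)
J = rng.normal(size=(K, N)) / np.sqrt(N)   # student weights J_i (adaptive)

for mu in range(20000):            # number of examples plays the role of time
    xi = rng.normal(size=N)        # i.i.d. components, zero mean, unit variance
    zeta = output(B, xi)           # teacher label zeta^mu
    x = J @ xi                     # student activations x_i = J_i . xi
    delta = output(J, xi) - zeta   # signed error on this example
    # On-line gradient descent on eps(J, xi) = 0.5 * delta**2, with the step
    # scaled by 1/N (a common convention in this setting, assumed here):
    J -= (eta / N) * np.outer(delta * g_prime(x), xi)
```

Averaging 0.5 * delta**2 over a window of recent examples gives a running estimate of the generalization error; how quickly, and whether, it decays as a function of the learning rate is the question the analytic solution is used to address.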


Related Articles

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and the second momentum term in feed-forward neural networks. This analysis is conducted on 250 different three-letter lowercase words from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

Adaptive Back-Propagation in On-Line Learning of Multilayer Networks

An adaptive back-propagation algorithm is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, both numerical studies and a rigorous analysis show that the adaptive back-propagation method results in faster training by breaking the symmetry bet...

Dynamics of Batch Learning in Multilayer

This paper investigates the dynamics of batch learning in multilayer neural networks. First, we present experimental results on the behavior of steepest-descent learning in multilayer perceptrons and linear neural networks. From the results of both models, we see that strong overtraining, an increase of the generalization error, occurs in overrealizable cases where the target function is reali...

Dynamics of Learning Near Singularities in Layered Networks

We explicitly analyze the trajectories of learning near singularities in hierarchical networks, such as multilayer perceptrons and radial basis function networks, which include permutation symmetry of hidden nodes, and show their general properties. Such symmetry induces singularities in their parameter space, where the Fisher information matrix degenerates and odd learning behaviors, especiall...

Effect of Batch Learning in Multilayer Neural

This paper discusses batch gradient descent learning in multilayer networks with a large amount of statistical training data. We emphasize the difference between regular cases, where the prepared model has the same size as the true function, and overrealizable cases, where the model has surplus hidden units to realize the true function. First, an experimental study on multilayer perceptrons an...

Designing stable neural identifier based on Lyapunov method

The stability of the learning rate in neural network identifiers and controllers is one of the challenging issues attracting great interest from neural network researchers. This paper suggests an adaptive gradient descent algorithm with stable learning laws for a modified dynamic neural network (MDNN) and studies the stability of this algorithm. Also, a stable learning algorithm for parameters of ...



Publication date: 1995